Extraction of Motif Patterns from Protein Sequences Using SVD with Rough K-Means Algorithm
Abstract
Discovering protein sequence motif information is one of the most crucial tasks in bioinformatics research. In this work, we try to obtain recurring protein patterns that are universally conserved across protein family boundaries. To generate higher-quality protein sequence motif information from the Protein Sequence Culling Server (PISCES) dataset, we tried several advanced clustering algorithms, such as hierarchical clustering and Self-Organizing Maps (SOM). However, since the dataset contains more than 660,000 segments, each with 180 dimensions, any clustering algorithm whose complexity exceeds O(n) is not applicable. Therefore, the first step of our research is to reduce the segments. The results suggest that the Singular Value Decomposition (SVD) technique is better suited to reducing the segments. The Rough K-Means clustering algorithm is then applied to the reduced segments. Our experiments indicate that, compared with K-Means, the Rough K-Means algorithm increases the percentage of sequence segments belonging to clusters with high structural similarity. The experimental results suggest that SVD with the Rough K-Means algorithm may be applied to other areas of bioinformatics research to explore the underlying relationships between data samples more effectively.
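As an illustration of the clustering step named in the abstract, the following is a minimal sketch of a Rough K-Means variant in which each cluster has a lower approximation (points that clearly belong to it) and a boundary region (points nearly as close to another centroid). It is not the authors' implementation: the function names, the ratio threshold `epsilon`, the weight `w_lower`, and the random initialization are all assumptions made for this example, and the SVD-based reduction step is not shown because the abstract does not describe how it is carried out.

```python
import numpy as np


def rough_assign(X, centroids, epsilon):
    """Return boolean (n, k) masks for lower and upper approximations."""
    # Pairwise Euclidean distances; fine for a small demo, too large for 660,000 rows.
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    rows = np.arange(len(X))
    nearest = dist.argmin(axis=1)
    d_min = np.maximum(dist[rows, nearest], 1e-12)
    # A point is in the upper approximation of every cluster whose centroid
    # is within a factor `epsilon` of its nearest-centroid distance.
    upper = dist <= epsilon * d_min[:, None]
    upper[rows, nearest] = True
    is_boundary = upper.sum(axis=1) > 1
    # Only points with a single candidate cluster enter a lower approximation.
    lower = np.zeros_like(upper)
    lower[rows, nearest] = ~is_boundary
    return lower, upper


def rough_kmeans(X, k, w_lower=0.7, epsilon=1.2, n_iter=50, seed=0):
    """Hypothetical Rough K-Means sketch; all parameter defaults are assumptions."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        lower, upper = rough_assign(X, centroids, epsilon)
        boundary = upper & ~lower
        new = centroids.copy()
        for j in range(k):
            lo, bd = X[lower[:, j]], X[boundary[:, j]]
            if len(lo) and len(bd):
                # Weighted mix of the lower-approximation mean and boundary mean.
                new[j] = w_lower * lo.mean(axis=0) + (1 - w_lower) * bd.mean(axis=0)
            elif len(lo):
                new[j] = lo.mean(axis=0)
            elif len(bd):
                new[j] = bd.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    # Recompute memberships so they match the returned centroids.
    lower, upper = rough_assign(X, centroids, epsilon)
    return centroids, lower, upper
```

Calling `rough_kmeans(X, k)` on an array of shape `(n_segments, n_features)` returns the centroids together with the two membership masks. Rows that are `True` in more than one column of `upper` are the ambiguous segments that plain K-Means would force into a single cluster.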
Similar Resources
Protein Sequence Motif Detection using Novel Rough Granular Computing Model
Protein sequence motif information is essential for the analysis of biologically significant regions. Discovering sequence motifs is a key task in understanding how sequences relate to their structures. Protein sequence motifs have the potential to determine the function and activities of proteins. Many algorithms or techniques are used to determine motifs which require a predefined fix...
Exploring Highly Structure Similar Protein Sequence Motifs using SVD with Soft Granular Computing Models
Protein sequence analysis is one of the vital areas of bioinformatics research. Protein sequence motifs determine the structure, function, and activities of a particular protein. The main objective of this paper is to obtain protein sequence motifs which are universally conserved across protein family boundaries. In this research, the input dataset is extremely large. Hence, an efficien...
Exploring Highly Structure Similar Protein Sequence Motifs using Granular Computing Model based on Adaptive FCM
Protein sequence motifs are very important to the analysis of biologically significant conserved regions for determining the conformation, function, and activities of proteins. These sequence motifs are identified from protein sequence segments generated from a large number of protein sequences. Not all generated sequence segments yield potential motif patterns. In this paper, short recurring...
Extraction of Motif Patterns from Protein Sequences Using K-Means with segment pruning methods
Bioinformatics is the application of information technology to the management of molecular biological data. Motif finding in protein sequences is one of the most crucial tasks in bioinformatics research. Motifs are identified as frequently recurring sub-patterns in segments of protein sequence data. Sequence motifs are verified by their structural similarities or their functional roles i...
Soft Granular Computing Model for Identifying Protein Sequence Motif Based on Svd-entropy Method
Bioinformatics is a field devoted to the interpretation and analysis of biological data using computational techniques. In recent years the study of bioinformatics has grown tremendously due to the huge amount of biological information generated by the scientific community. Proteins are made up of chains of amino acids. Protein sequence motifs are small fragments of conserved amino acids often associate...